EVALUATION AND SELECTION OF PROGRAMMING CODE
An embodiment of the invention is directed to the generation and
optimization of computer programming code. Other embodiments are also
described.

BACKGROUND 

When an application program is launched or run in a computer, the 
computer is executing what is referred to as a binary image (or simply, binary) 
of the program. That is not, however, the version in which the program was
originally created by its author. Due to the inherent design and complexity of a
computer, programs are written using a higher level programming language
that is more readily understandable to a human programmer. A program is
initially written in what is called a source programming language (resulting in
source code or a source file). It is then translated down into the binary image
version (also referred to as the executable or executable file) before being
loaded into the computer's memory for execution. Software programs or tools,
referred to collectively here as code generators, are used by the programmer to
perform this translation. A code generator is selected that is able to translate a
particular source file into an executable file that is to be run on a given
computer hardware platform (e.g., one that is based on a Pentium® processor
by Intel Corp., Santa Clara, California).

A code generator may have the following components. A compiler 
translates one or more input source files that are written in a high level 
language (e.g., C, C++, Fortran, Pascal, Basic, as well as others) into object code
or object files which are in a low level language referred to as machine
language. Next, a linker joins the object files, together with library object files
that have been previously compiled, into a binary image (the executable file).
The binary may then be loaded into the main memory of the computer and
executed by one or more of its processors.



Modern integrated circuit technologies used in advanced computer 
components are being adopted at a rapid pace. Advances are being rapidly 
made in computer platform architectures, such as one based on a Pentium®
processor, and new hardware components are being designed and
manufactured that allow the same platform to be applied to different fields.
These include, for example, personal computer (PC) desktops, laptops, home
entertainment PCs, servers, home appliances, dedicated video game machines,
and mobile hand-held devices such as cellular telephones and multifunction
personal digital assistants (PDAs). Different fields, however, present different
requirements for the binaries that will be running on top of the hardware
platform. For example, a program that is to run on a server is expected to have
high performance while it is running, while programs that are for mobile
devices may have more stringent code size as well as power consumption
constraints. In other instances, a program is to be stored in non-volatile, solid
state memory of the platform, which has even more stringent limits on storage 
space due to cost concerns. Such programs are sometimes referred to as 
firmware, and may need to be compressed, prior to being stored. 

Current code generation tools, including compilers, linkers, and binary 
optimizers, provide optimization controls that can be selected by the user in an 
effort to generate code that has a higher performance, smaller code size, or
lower power consumption. A binary optimizer, also sometimes referred to as a
post-link optimizer, is a tool that is used to improve the performance of a
program after it has been compiled and linked. The tool directly operates on
the executable file and is thus said to rewrite the executable, in accordance with
certain user specified optimization controls. Each of these tools may expose its
own set of optimization controls to the user. 

Current code generation tools, however, do not provide a systematic 
and automated approach to meet sophisticated code generation requirements. 
For example, the current tools do not allow the user to specify simultaneously
both a code size optimization setting, i.e., one that is expected to reduce the size
of the binary, and a performance optimization setting, i.e., one that is expected
to increase the performance of the binary.

BRIEF DESCRIPTION OF THE DRAWINGS 

The embodiments of the invention are illustrated by way of example 
and not by way of limitation in the figures of the accompanying drawings in 
which like references indicate similar elements. It should be noted that 
references to "an" embodiment of the invention in this disclosure are not 
necessarily to the same embodiment, and they mean at least one. 

Fig. 1 is a block diagram of a system for evaluating and selecting a 
binary based on its figure of merit, in accordance with an embodiment of the 
invention. 

Fig. 2 is a block diagram of an example system for generating the
binaries.



Fig. 3 shows another example system for generating the binaries. 

Fig. 4 is a flow diagram of a methodology for generating multiple
binaries and then selecting or ranking them, in accordance with their figures of
merit.



Fig. 5 is a block diagram of a full featured system for code generation, 
evaluation, and selection, in accordance with an embodiment of the invention. 

Fig. 6 is a block diagram of a computer on which a software tool, in 
accordance with an embodiment of the invention, can run. 

DETAILED DESCRIPTION 

Fig. 1 is a block diagram of a system for evaluating and selecting a 
binary based on its figure of merit, in accordance with an embodiment of the
invention. A first evaluator 104 measures a first characteristic of several input
binaries 106. The evaluator 104 computes a number of first figures of merit
(FOMs) 108 for the input binaries 106. In other words, FOM1 is the computed
figure of merit for binary 1, FOM2 is the computed figure of merit for binary 2,
etc. In this example, there are four binaries 106 illustrated; however, there may
be as few as two or more than four, depending on how fine-grained the
available optimization controls are. Different techniques for generating the 
binaries 106 will be described below. 

Each of these input binaries 106 is generated with a different code
generator optimization setting, for the same processor instruction set
architecture. The input binaries 106 may also be based on the same set of one
or more source files (not shown). The binaries 106 may all be generated using
the same code generator tool set, configured according to different
optimization settings. The tool set may include components from different
software vendors (e.g., a compiler and linker from one vendor, and a binary
rewriter from another). Note that the input binaries 106 may be generated 
either manually by the user one at a time, or as described below automatically 
according to a script. 

The evaluator 104 in this example is to measure the performance of each 
input binary. This may be done by having the binary be executed by a 
hardware platform that implements the processor instruction set architecture 
for which the binary has been generated. Alternatively, the evaluator 104 may 
include a software simulation tool, which simulates the hardware platform, 
including the processor and I/O device resources that are present in the actual 
hardware platform. The binary is thus executed on its intended hardware 
platform, either actually or through simulation, and its performance is
measured. Performance may be measured by feeding the running binary a 
predefined set of inputs and measuring how fast the expected outputs are 
produced. The measured performance is then translated into the FOM 108. 
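
As a rough illustration of the measurement just described, the following Python sketch times one run of a binary on a predefined input and checks that the expected output was produced. The function name `measure_run_time` and its parameters are hypothetical and are not part of the described system; the elapsed time would still have to be translated into an FOM 108.

```python
import subprocess
import time


def measure_run_time(command, input_bytes, expected_output):
    """Run `command` once on a predefined input and return the elapsed seconds.

    Raises ValueError if the binary does not produce the expected output,
    since a fast but incorrect binary should not be scored.
    """
    start = time.perf_counter()
    result = subprocess.run(command, input=input_bytes,
                            capture_output=True, check=True)
    elapsed = time.perf_counter() - start
    if result.stdout != expected_output:
        raise ValueError("binary produced unexpected output")
    return elapsed
```

A single run is noisy on a real platform; an evaluator along these lines would more plausibly repeat the run several times and keep the median or the minimum.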

... useful when the code generation requirements (of a particular field of use) call
for a binary that has a maximum compressed file size. For example, the binary 
may include a firmware driver program that is to be stored in non-volatile, 
solid state memory of a computer (e.g., a basic I/O system, BIOS, a boot routine,
or a network management program). 
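
A compressed file size measurement of this kind might be sketched as follows. The text does not name a compression method, so the use of zlib here is purely an assumption, and the helper names `compressed_size` and `meets_size_limit` are hypothetical.

```python
import zlib


def compressed_size(binary_bytes, level=9):
    """Return the size in bytes of the zlib-compressed binary image."""
    return len(zlib.compress(binary_bytes, level))


def meets_size_limit(binary_bytes, max_compressed_bytes):
    """True if the compressed image fits within the given storage budget."""
    return compressed_size(binary_bytes) <= max_compressed_bytes
```

In practice the evaluator would use whatever compressor the firmware loader actually supports, since different compressors can rank the same binaries differently.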

The FOMs 120 are also fed to the binary selector 110 which compares the 
FOMs 120, while aiming at selecting the binary with the lowest or highest 
overall FOM. The "comparison" involving the FOMs 108 and FOMs 120 is 
broadly defined here, and may be implemented in several ways. As one 
example, an overall FOM is computed for each input binary 106, as a function 
of the FOM 108 and FOM 120. This may be a simple equation such as 

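\[ \text{overall FOM1} = \text{FOM1}_{\text{perf.}} + \text{FOM1}_{\text{compr.size}} \qquad \text{(Equation 2)} \]

\[ \text{overall FOM2} = \text{FOM2}_{\text{perf.}} + \text{FOM2}_{\text{compr.size}} \]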

In yet another alternative, the comparison amongst the different FOMs 
may use the concept of a vector for each binary. For example, 



\[ \text{overall FOM1} = \sqrt{\text{FOM1}_{\text{perf.}}^{2} + \text{FOM1}_{\text{compr.size}}^{2}} \qquad \text{(Equation 3)} \]

\[ \text{overall FOM2} = \sqrt{\text{FOM2}_{\text{perf.}}^{2} + \text{FOM2}_{\text{compr.size}}^{2}} \]

If the performance FOM is defined as above, namely, the greater the 
performance of a binary, the smaller its associated FOM 108, then the 
compressed size FOM should be defined so that the smaller the compressed file 
size of a binary, the smaller its associated FOM 120. This approach thus defines 
the "best" overall FOM as the one having the lowest value. 
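
The two ways of combining the FOMs given in Equations 2 and 3 can be sketched in Python as shown here. The function names and the numeric FOM values are made up for illustration; only the two formulas and the "lowest value is best" convention come from the text.

```python
import math


def overall_fom_sum(fom_perf, fom_size):
    """Equation 2: equally weighted sum of the two figures of merit."""
    return fom_perf + fom_size


def overall_fom_vector(fom_perf, fom_size):
    """Equation 3: magnitude of the (performance, compressed size) vector."""
    return math.sqrt(fom_perf ** 2 + fom_size ** 2)


def select_best(foms, combine=overall_fom_vector):
    """Return the name of the binary with the lowest overall FOM.

    `foms` maps a binary name to its (performance FOM, size FOM) pair.
    """
    return min(foms, key=lambda name: combine(*foms[name]))


# Made-up FOM values for two input binaries; lower is better.
example = {"binary1": (3.0, 4.0), "binary2": (1.0, 5.5)}
```

With these made-up values, Equation 3 favors binary1 (5.0 against roughly 5.59) while Equation 2 favors binary2 (6.5 against 7.0), so the choice of combining function can change which binary is selected.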

Note that in the comparisons described above involving two measured 
characteristics, the equations for overall FOM weight the performance and 
compressed size FOMs equally. As an alternative, the equation may specify 
... input. The script processor 208 may then step through each subsequent
optimization setting that is specified in the script 210, instructing the compiler 
204 and linker 206 to produce further binaries 106 one after the other (again 
based on the same source program 202). The system thus automatically 
executes the wishes specified in the script 210, and generates a number of 
binaries 106 that are then evaluated by the system depicted in Fig. 1 described
above.
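
A script processor of this kind could be approximated by the short Python sketch that follows. It is only an illustration under stated assumptions: the script is represented as a list of flag lists, a single `cc` driver is assumed to perform both the compile and link steps (whereas the text describes a separate compiler 204 and linker 206), and the names `build_command` and `process_script` are hypothetical.

```python
import subprocess


def build_command(source, output, flags):
    """Build one compile-and-link command line (the `cc` driver is assumed)."""
    return ["cc", *flags, source, "-o", output]


def process_script(source, script, run=subprocess.run):
    """Step through each optimization setting and produce one binary per setting.

    `script` is a list of flag lists; `run` executes a command line and can
    be replaced by a stub when no compiler is available.
    """
    binaries = []
    for index, flags in enumerate(script, start=1):
        output = f"binary{index}"
        run(build_command(source, output, flags), check=True)
        binaries.append((output, flags))
    return binaries


# Hypothetical script with three optimization settings.
example_script = [["-O2"], ["-Os"], ["-O3", "-funroll-loops"]]
```

Every binary is built from the same source file, so any difference measured later can be attributed to the optimization setting alone.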



Each optimization setting is different from the others, and may be defined
based on the user's knowledge of what each optimization setting is expected to
accomplish in a general sense (in terms of the associated binary being more
suitable for a given field of use). With the help of the evaluator and binary
selector of Fig. 1, the user can in effect request optimizations that traditionally
would not be allowed to be performed simultaneously by a conventional tool.
For instance, there may be a combination optimization setting that specifies a
compiler control that is expected to generate faster but more voluminous code,
combined with a control that is expected to produce smaller code. Using the
systematic approach of the evaluator and binary selector, the net effects of
several such optimization settings are evaluated and compared (through the
FOM mechanism described above) to find the "best" one. This systematic 
process helps remove some of the guess work that may otherwise be 
unavoidable while trying to find the best optimization setting for a particular 
field of use. 

Turning now to Fig. 3, another embodiment of the invention is depicted
where the binaries 106 are generated by a binary rewriter 304 based on the
same "initial" binary 302. The binary rewriter 304 may be a conventional
binary rewriting tool, e.g., a static binary translator with both its input binary 
and its output binary targeting the same instruction set architecture. The 
binary rewriter 304 exposes optimization controls to the user that, in this 
embodiment, are received from a script processor 308 that has read the user's 
... range of the characteristic being measured (e.g., the longest and shortest execution
times expected for any given input binary) and a defined range for the FOM. 
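
One way to relate a measured range to a defined FOM range is a linear mapping, sketched below in Python. The linear form, the clamping of out-of-range values, the default FOM range of 0 to 100, and the name `to_fom` are all assumptions made for illustration; the text only states that both ranges are involved.

```python
def to_fom(measured, measured_min, measured_max, fom_min=0.0, fom_max=100.0):
    """Linearly map a measured value onto the defined FOM range.

    Values outside the expected measured range are clamped to its ends.
    """
    if measured_max <= measured_min:
        raise ValueError("measured_max must exceed measured_min")
    clamped = min(max(measured, measured_min), measured_max)
    fraction = (clamped - measured_min) / (measured_max - measured_min)
    return fom_min + fraction * (fom_max - fom_min)
```

Under this mapping a shorter execution time yields a smaller FOM, which agrees with the convention that better performance corresponds to a lower FOM.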

In operation 408, the current overall FOM is compared with a prior 
overall FOM. The latter is an overall FOM that may have been previously 
computed and that is associated with a prior version of the binary. For the 
initial pass, there may be no prior computed FOM, such that operation 408 may 
be skipped. 

After the comparison in operation 408, if there are further optimization 
settings to be evaluated (operation 412) then the process cycles, with another 
optimization setting. This time, operation 408 is performed since there is a 
prior overall FOM available now. Note that in subsequent cycles, operation 
408 may involve multiple comparisons between the current overall FOM and 
each of several prior overall FOMs that are stored. 

Once the last optimization setting has been evaluated, the loop is exited 
at operation 412, and the process proceeds with operation 414 and/or
operation 416. In the former, the system indicates to the user which version of
the input binaries has the highest or lowest ("best") overall FOM, as
determined from the comparisons that were made in the iterative process.
Note that the system may be designed such that only the binary that has the 
highest or lowest overall FOM value at any given point in the iterative process 
is saved (thereby helping conserve memory resources). 
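
The iterative process, including the option of saving only the best binary seen so far, can be sketched in Python as follows. The callables `generate` and `evaluate` are hypothetical placeholders for the code generator and the evaluator, and the sketch assumes that the lowest overall FOM is the best.

```python
def select_lowest(settings, generate, evaluate):
    """Iterate over optimization settings, keeping only the best binary so far.

    `generate(setting)` returns a binary; `evaluate(binary)` returns its
    overall FOM, where lower is better. Both are supplied by the caller.
    """
    best = None  # (overall FOM, setting, binary) of the best version so far
    for setting in settings:
        binary = generate(setting)
        current = evaluate(binary)
        # The comparison is skipped on the initial pass (no prior FOM yet).
        if best is None or current < best[0]:
            best = (current, setting, binary)
    return best
```

Because only one binary is retained at any time, memory use stays constant regardless of how many optimization settings are explored.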

In addition to, or as an alternative to, operation 414, there is operation 
416 in which the system can display to the user a ranking of the different 
binary versions, in accordance with their overall FOMs, e.g., from highest to
lowest. This embodiment of the invention allows the user to quickly determine
how "far apart" the different versions of the binaries are from each other in
view of the respective optimization settings used to generate them. Other ways
of displaying the results of the comparison performed by the binary selector
are possible.
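
A ranking display of this kind might look like the Python sketch below, which orders the binaries by overall FOM and shows how far each one is from the first entry. The function names and the output format are assumptions for illustration only.

```python
def rank_binaries(overall_foms, highest_first=True):
    """Return (name, overall FOM) pairs ordered by overall FOM."""
    return sorted(overall_foms.items(), key=lambda item: item[1],
                  reverse=highest_first)


def format_ranking(ranked):
    """Render one line per binary, with its distance from the first entry."""
    top = ranked[0][1]
    return [f"{rank}. {name}: {fom:.2f} ({fom - top:+.2f})"
            for rank, (name, fom) in enumerate(ranked, start=1)]
```

The signed difference in each line is one simple way to show how "far apart" the versions are.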

The flow diagram of Fig. 4 also shows operation 406 which is performed 
in situations where there is at least one further characteristic that is to be 
measured for each input binary. In that case, the FOM comparisons of 
operation 408 may be carried out as vector operations. For example, an 
overall FOM can be computed as a function of all of the FOMs for the current 
version, as in Equation 3 above. This value is then compared with a prior, 
overall FOM. The latter is computed as a function of all of the FOMs of a prior 
version. 

Turning now to Fig. 5, a block diagram of a full featured system for code 
generation, evaluation, and selection, in accordance with an embodiment of the 
invention is shown. In contrast to the embodiments described above in Figs. 2 
and 3, the code generator in this case includes all three code generation 
components, namely, compiler 204, linker 206, and binary rewriter 304. Each 
optimization setting in this instance may include controls for any one or all of 
the compiler, linker, and binary rewriter. Each new binary image at the output 
of binary rewriter 304 is fed to each one of a number of cost evaluators 508. 
The cost evaluator 508 computes a cost, as a function of a measured 
characteristic of the binary. Here, "cost" is not limited to a monetary amount 
that is to be paid or charged. Rather, it is used more generally to refer to any 
outlay or expenditure (as an effort or sacrifice) made to achieve an object. 
Alternatively, cost represents a loss or penalty that is incurred by the measured 
characteristic of the binary. Under this approach for the FOM, the binary that 
is associated with the lowest cost, or the lowest overall cost in the case where 
multiple factors are to be taken into consideration, becomes the best binary. 
This is determined by the binary selector 510 which compares the computed 
costs and selects the binary having the lowest overall cost. An example pseudo
code that describes the framework of cost-driven code generation through an
iterative exploration process (based on the concept of Fig. 5) is given below.
for (each optimization combination control of the compiler)
{
    Compile the source codes with the current compiling optimization control;
    for (each optimization combination control of the linker)
    {
        Link the program P1 using the current linking optimization control;
        for (each optimization combination control of the binary rewriter)
        {
            Rewrite the program P1 using the current optimization control
            for the binary rewriting to generate new binary image P2;
            Arbitration between P2 and the old binary image
            kept in the Binary Selector;
        }
    }
}

The above described pseudo code thus provides the most optimized 
(lowest cost) code, by rewriting the binary multiple times (each time using a
different optimization setting) in the inner loop, and then recompiling the 
source program and relinking the recompiled object files (outer loop). 
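
The nested exploration described above can be expressed as a runnable Python sketch. The callables `compile_fn`, `link_fn`, and `rewrite_fn` are hypothetical stand-ins for the compiler, linker, and binary rewriter, and taking the overall cost as a plain sum over the cost evaluators is an assumption; the text leaves the combining function open.

```python
def explore(compiler_controls, linker_controls, rewriter_controls,
            compile_fn, link_fn, rewrite_fn, cost_fns):
    """Exhaustively explore control combinations and keep the lowest-cost image.

    The *_fn callables stand in for the compiler, linker, and binary
    rewriter; each cost function in `cost_fns` plays the role of one cost
    evaluator, and the overall cost is taken here to be their plain sum.
    """
    best_cost, best_image = None, None
    for c_ctl in compiler_controls:
        objects = compile_fn(c_ctl)                 # recompile
        for l_ctl in linker_controls:
            p1 = link_fn(objects, l_ctl)            # relink
            for r_ctl in rewriter_controls:
                p2 = rewrite_fn(p1, r_ctl)          # rewrite
                cost = sum(fn(p2) for fn in cost_fns)
                # Arbitration: keep P2 only if it beats the stored image.
                if best_cost is None or cost < best_cost:
                    best_cost, best_image = cost, p2
    return best_cost, best_image
```

The number of binaries generated is the product of the three control counts, so in practice the control lists would need to be kept short or pruned.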

Integrating the compiler, linker, and the binary rewriter in the manner 
described above brings additional capabilities for code optimization. Also, the
system flexibility of bundling together several evaluators gives the general
framework the ability to take additional factors into consideration when
selecting the best binary image. The system also provides a framework to
better study the correlation between a particular optimization control and the
cost implications that are brought as a result into the binary image.

Turning now to Fig. 6, a block diagram of a computer is shown, on
which a software tool, in accordance with an embodiment of the invention, can
run to perform the processes described above. The computer has a processor
604 that is an example of a machine that can execute instructions stored in main
memory 606, to perform the operations described above. The main memory
[...] 607 and on top of which will run a code generator program 609, [...]
116, and binary selector 110. The code generator program 609 may include one
or more of the code generator components described above such as a compiler,
linker, and binary rewriter. These programs may also be provided in other
forms of machine-readable media. The results of executing programs in main
memory [...] of input binaries by the binary selector 110, on a user display 616.
The processor 604 accesses the user display 616 through an I/O interface 610,
and a [...] controllers, [...]
and memory to communicate with additional devices, including, for instance,
a network interface 624 over which the computer can be used to access other
nodes of a network. Other arrangements of a computer for runmng the 
different software tools described above are possible. 

A machine-readable medium may include any mechanism for storing or
transmitting information (such as any one or more of the software components
[...] to Compact Disc Read-Only Memory (CD-ROMs), Read-Only Memory (ROMs),
Random Access Memory (RAM), Erasable Programmable Read-Only Memory
(EPROM), and a transmission over the Internet.

The invention is not limited to the specific embodiments described
above. [...] operations 404 and 406 may occur sequentially. In general, the order of
operations, as they are illustrated in Fig. 4 for example, may be changed in
practice, depending on any given implementation of the overall process. For
instance, the different versions of the binary may be produced in parallel by the
same machine, using a multi-threading process. Accordingly, other
embodiments are within the scope of the claims. 
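
Producing the different versions of the binary in parallel, as mentioned above, could be sketched with a thread pool in Python. The function name `generate_in_parallel`, the default worker count, and the caller-supplied `generate` function are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def generate_in_parallel(settings, generate, max_workers=4):
    """Produce one binary per optimization setting using a thread pool.

    `generate(setting)` is a caller-supplied function that builds and
    returns one binary version. Results keep the order of `settings`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(generate, settings))
```

Threads are a reasonable fit when `generate` mostly waits on external compiler, linker, or rewriter processes; each worker should write to its own output file so that concurrent builds do not overwrite one another.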